Lysinibacillus sphaericus strain OT4b.31 is a native Colombian strain that has no larvicidal activity against Culex quinquefasciatus and is widely applied in the bioremediation of heavy-metal polluted environments. Strain OT4b.31 was placed between DNA homology groups III and IV. By gap-filling and alignment steps, we propose a 4,096,672 bp chromosomal scaffold. The whole genome (4,856,302 bp in 94 contigs, with 4,846 predicted protein-coding sequences) revealed differences in comparison to the L. sphaericus C3-41 genome, such as syntenial relationships, prophages and putative mosquitocidal toxins. Sphaericolysin B354, the coleopteran toxin SiplA and heavy metal resistance clusters from the nik, ars, czc, cop, chr, czr and cad operons were identified. Lysinibacillus sphaericus OT4b.31 has applications not only in bioremediation efforts, but also in the biological control of agricultural pests.
Introduction



Biological control of vector-borne diseases, such as dengue and malaria, and of agricultural pests has been an issue of special concern in recent years. Since Kellen et al. [1] first described Lysinibacillus sphaericus as an insect pathogen, studies have shown mosquitoes to be the major target of this bacterium [2-4], but toxic activity against other species has also been reported [5,6]. L. sphaericus larvicidal toxicity has been attributed to vegetative mosquitocidal toxins (Mtx) [7], the binary toxin (BinA/BinB) [4], the Cry48/Cry49 toxin [8] and, more recently, the S-layer protein [9]. To date, no larvicidal activity against Culex quinquefasciatus has been identified in Lysinibacillus sphaericus OT4b.31 [10].

On the other hand, Lysinibacillus species are potential candidates for heavy metal bioremediation. Some Bacillaceae strains have been successfully isolated from nickel contaminated soil [11], industrial landfills [12], naturally metalliferous soils [13] and a uranium-mining waste pile [14]. In addition, native Colombian Lysinibacillus strains have been reported as potential metal bioremediators. Strain CBAM5 is resistant to arsenic, up to 200 mM, and contains the arsenate reductase gene [15]. L. sphaericus OT4b.31 showed heavy metal biosorption in living and dead biomass. The S-layer protein was also shown to be present [16]. We observed 19 mosquito-pathogenic L. sphaericus strains and 6 non-pathogenic strains (including OT4b.31) that were able to grow in arsenate, hexavalent chromium and/or lead [17]. The moderate heavy metal tolerance in a Lysinibacillus strain isolated from a non-polluted environment generates interest in characterizing the genomic properties of L. sphaericus OT4b.31, in addition to its biotechnological potential in biological control.

Here we present a summary classification and a set of features for Lysinibacillus sphaericus OT4b.31, including previously unreported aspects of its phenotype, together with the description of the complete genomic sequencing and annotation.
Classification and features

Formerly known as Bacillus sphaericus, the species was defined as having a spherical terminal spore and by its inability to ferment sugars [18]. According to physiological and phylogenetic analysis, it was reassigned to the genus Lysinibacillus [19]. Strains of L. sphaericus can be divided into five DNA homology groups (I-V). Some mosquito pathogenic strains are allocated in subgroup II-A, while the species Lysinibacillus fusiformis is in subgroup II-B [20]. Later, according to 16S rDNA and lipid profile comparisons, Lysinibacillus sphaericus sensu lato was classified into seven similarity subgroups, of which only four retained the previous description by Krych et al. [21]. Recently, by using 16S rDNA phylogenetic analysis, some mosquito pathogenic native strains were found in group II with heterogeneous heavy metal tolerance levels [17].

Partial 16S rRNA gene sequences (1,421 bp) were aligned to establish the phylogenetic neighborhood of Lysinibacillus sphaericus OT4b.31 (Figure 1). The phylogenetic tree was constructed by neighbor-joining [23] using the SEAVIEW [24] and TreeGraph2 [25] packages. Genetic distances were estimated by using the Jukes-Cantor model [23]. The stability of relationships was assessed by bootstrap analysis based on 1,000 resamplings for the tree topology. Interestingly, L. sphaericus OT4b.31 did not fall into any existing DNA similarity group; it was found between DNA similarity groups III and IV [21]. Consistent with Lozano & Dussan [17], L. sphaericus OT4b.31 did not fall into DNA similarity groups I, II or III.
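
To make the distance estimate concrete, the following minimal sketch computes the Jukes-Cantor distance between two pre-aligned sequences. It is an illustration only, not the SEAVIEW implementation used for Figure 1; the function name and the choice to skip gapped or ambiguous sites are assumptions made for this example.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance between two aligned DNA sequences.

    Hypothetical helper for illustration: sites where either sequence has a
    gap or an ambiguous base are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        # The JC69 correction is undefined at or beyond saturation.
        return math.inf
    # d = -(3/4) * ln(1 - (4/3) * p)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction grows faster than the raw mismatch proportion because it accounts for multiple substitutions at the same site; a matrix of such pairwise distances is the input that neighbor-joining works from.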

Dussan et al. [10] evaluated physiological diversity and genetic potential in native Bacillaceae isolates from highlands of the Colombian Andes, where Lysinibacillus sphaericus OT4b.31 was first described (Table 1). L. sphaericus OT4b.31 is an aerobic free-living bacterium isolated from coleopteran (beetle) larvae collected in the highlands of the Colombian Andes [10]. Vegetative cells stain Gram positive, but in sporulating stages, cells stain Gram variable (Figure 2). By using a JEOL JSM-5800LV (Japan) scanning electron microscope, L. sphaericus OT4b.31 cells were estimated to measure 0.61 to 0.65 µm in width and 1.9 to 2.3 µm in length (Figure 3). L. sphaericus OT4b.31 showed slow sporulation rates (undetectable up to 40 hours of growth) and positive evidence of binary toxin, which does not exhibit larvicidal activity against Culex quinquefasciatus [10]. Cultures grow at 10 to 40°C over a pH range of 6.0 to 9.0. Antibiotic resistance was evaluated separately by adding filter-sterilized antibiotic solutions to Luria-Bertani broth and checking turbidity after 15 hours of growth. L. sphaericus OT4b.31 is sensitive to kanamycin (12.5 µg/mL), chloramphenicol (25 µg/mL), erythromycin (5 µg/mL) and gentamicin (25 µg/mL), while it showed resistance to trimethoprim/sulfamethoxazole up to 30 µg/mL/150 µg/mL.



Genome sequencing information 

Genome project history 

The genome sequencing of Lysinibacillus sphaericus OT4b.31 was supported by the CIMIC (Centro de Investigaciones Microbiologicas) laboratory at the University of Los Andes within the Grant (1204-452-21129) of the Instituto Colombiano para el fomento de la Investigación Francisco Jose de Caldas. Whole genomic DNA extraction and bioinformatics analysis were performed at the CIMIC laboratory, whereas library construction and whole shotgun sequencing were performed at the Beijing Genome Institute (BGI) Americas Laboratory (Tai Po, Hong Kong). The applied pipeline included quality check of reads, de novo assembly, a gap-filling step and mapping against a reference genome. This whole genome shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession AQPX00000000. The version described in this paper is the first version, AQPX01000000. A summary of the project information is shown in Table 2.



Growth conditions and DNA isolation 

Lysinibacillus sphaericus strain OT4b.31 was grown in nutrient broth for 16 hours at 30°C and 150 rev/min. High molecular weight DNA was isolated using the EasyDNA® Kit (Invitrogen, Carlsbad, CA, USA) as indicated by the manufacturer. DNA purity and concentration were determined in a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE, USA).
Table 1. Classification and general features of Lysinibacillus sphaericus OT4b.31 according to the MIGS recommendations [26]

MIGS ID    Property                  Term                                                            Evidence code^a
           Current classification    Domain Bacteria                                                 TAS [27]
                                     Phylum Firmicutes                                               TAS [28-30]
                                     Class Bacilli                                                   TAS [31,32]
                                     Order Bacillales                                                TAS [33,34]
                                     Family Bacillaceae                                              TAS [33,35]
                                     Genus Lysinibacillus                                            TAS [19,36]
                                     Species Lysinibacillus sphaericus                               TAS [19,37]
                                     Type strain OT4b.31                                             TAS [10]
           Gram stain                Positive in vegetative cells, variable in sporulating stages    IDA
           Cell shape                Straight rods                                                   IDA
           Motility                  Non-motile                                                      IDA
           Sporulation               Sporulating                                                     IDA
           Temperature range         Mesophile, grows > 14°C, < 37°C                                 TAS [10]
           Optimum temperature       30°C                                                            TAS [10]
           Carbon source             Complex carbohydrates                                           TAS [10]
           Energy metabolism         Heterotroph                                                     TAS [10]
MIGS-6     Habitat                   Coleopteran (beetle) larvae                                     TAS [10]
MIGS-6.3   Salinity                  Growth in Luria-Bertani broth (5% NaCl)                         IDA
MIGS-22    Oxygen requirement        Aerobic                                                         TAS [10]
MIGS-15    Biotic relationship       Free living                                                     TAS [10]
MIGS-14    Pathogenicity             Unknown                                                         TAS [10]
MIGS-4     Geographic location       Tenjo, Cundinamarca, Colombia                                   TAS [10]
MIGS-5     Sample collection time    1995                                                            TAS [10]
MIGS-4.1   Latitude                  4.88727                                                         TAS [10]
MIGS-4.2   Longitude                 -74.132831
MIGS-4.3   Depth                     Surface                                                         TAS [10]
MIGS-4.4   Altitude                  2,685 m above sea level                                         TAS [10]



a) Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the 
literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based 
on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene 
Ontology project [38]. 

Figure 1. Phylogenetic tree highlighting the position of Lysinibacillus sphaericus OT4b.31 relative to the available type strains and other non-assigned species within the families Alicyclobacillaceae and Bacillaceae. Alicyclobacillus cycloheptanicus was designated as the outgroup species for the analyses. Right brackets encompass each homology group (I-VII) according to Nakamura's benchmarks [21]. Nucleotide sequences obtained from GenBank and used in the phylogenetic analyses were as follows: Alicyclobacillus cycloheptanicus 1457 (X51928), Geobacillus stearothermophilus 10 (X57309), Bacillus subtilis 168T (X60646), Bacillus licheniformis DSM 13 (X68416), Bacillus megaterium IAM 13418T (D16273), Bacillus sp. BD-87 (AF169520), Bacillus sp. BD-99 (AF169525), Bacillus sp. NRS-1691 (AF169531), Bacillus sp. NRS-1693 (AF169533), Solibacillus silvestris StLB046 (NR_074954), Lysinibacillus massiliensis 4400831 (NR_043092), Bacillus sp. NRS-250 (AF169536), Bacillus sp. B-1876 (AF169494), Bacillus sp. NRS-1198 (AF169528), Bacillus sp. B-4297 (AF169507), Bacillus sp. NRS-111 (AF169526), Lysinibacillus sphaericus OT4b.31 (AQPX00000042.1:91-1546), Bacillus sp. B-183 (AF169493), Lysinibacillus sphaericus B-23268T (AF169495), Lysinibacillus sphaericus JG-A12 (AM292655), Bacillus sp. B-14905 (AF169491), Lysinibacillus sphaericus ZC1 (NZ_ADJR01000054.1:1-1487), Lysinibacillus sphaericus C3-41 (NC_010382.1:16887-18287), Bacillus sp. B-14865 (AF169490), Lysinibacillus sphaericus 2362 (L14011), Lysinibacillus fusiformis ATCC-7055 (AJ310083), Bacillus sp. B-14957 (AF169492) and Bacillus sp. B-23269 (AF169496). The branches are scaled in terms of the expected number of substitutions per site. Numbers adjacent to the branches represent percentage bootstrap values based on 1,000 replicates. Lineages with type strain genome sequencing projects registered in GOLD [22] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks.
Figure 2. Gram staining of (A) vegetative cells and (B) spores of Lysinibacillus sphaericus OT4b.31.



Genome sequencing and assembly 

After DNA extraction, samples were sent to the Beijing Genome Institute (BGI) Americas Laboratory (Tai Po, Hong Kong). The purified DNA sample was first sheared into smaller fragments of the desired size with a Covaris E210 ultrasonicator. The overhangs resulting from fragmentation were then converted into blunt ends by using T4 DNA polymerase, Klenow Fragment and T4 polynucleotide kinase. After adding an "A" base to the 3' end of the blunt phosphorylated DNA fragments, adapters were ligated to the ends of the DNA fragments. The desired fragments were purified through gel electrophoresis, then selectively enriched and amplified by PCR. The index tag was introduced into the adapter at the PCR stage as appropriate, and a library quality test was performed. Lastly, qualified short paired-end libraries (90:90 bp reads with a 500 bp insert) were used for cluster preparation and whole-shotgun sequencing on Illumina Hi-Seq 2000 sequencers.




Figure 3. Scanning electron micrograph of Lysinibacillus sphaericus OT4b.31 at an operating voltage of 20 kV.
Using the FASTX-Toolkit version 0.6.1 [39] and clean_reads version 0.2.3 from the ngs_backbone pipeline [40], reads were trimmed and quality filtered. Then, with the CLC Assembly Cell version 4.0.10 [41], assembly and scaffolding steps were conducted via a de novo assembly pipeline. The assembly included automatic scaffolding and k-mer/overlapping optimization steps. Some gaps were successfully filled by using GapFiller [42] within 30 iterations. No further gaps were closed by running more iterations. To obtain structural insight into a chromosomal scaffold, we used CONTIGuator.2 [43], with the Lysinibacillus sphaericus strain C3-41 chromosome (accession number: CP000817.1) as reference. Gap-filling steps and mapping to reference sequences were performed again to confirm convergence. Quality assessment of the assembly was performed with iCORN [44]. The error rate of the final assembly is less than 1 in 1,000,000. Lastly, by using PROmer from the MUMmer [45] and Mauve [46] packages, we compared the chromosomal assembly and the chromosome of L. sphaericus C3-41.
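
As a rough illustration of the trimming and quality-filtering step that opens the pipeline above, the sketch below trims low-quality bases from the 3' end of a single read and discards reads that end up too short. It is not the FASTX-Toolkit or clean_reads implementation; the function name, the quality threshold of 20, the minimum length of 50 and the Phred+33 encoding are all assumptions chosen for the example, since the parameters actually used are not stated.

```python
from typing import Optional, Tuple


def trim_read(
    sequence: str,
    quality: str,
    min_quality: int = 20,   # assumed threshold, not taken from the paper
    min_length: int = 50,    # assumed minimum read length after trimming
    offset: int = 33,        # assumed Phred+33 quality encoding
) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from the 3' end of one read.

    Returns the trimmed (sequence, quality) pair, or None when the trimmed
    read is shorter than min_length and should be discarded.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")

    end = len(sequence)
    # Walk back from the 3' end while the base quality is below threshold.
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1

    if end < min_length:
        return None
    return sequence[:end], quality[:end]
```

For paired-end data such as the 90:90 bp library described here, both mates would normally be filtered together so that pairing is preserved; that bookkeeping is left out of this sketch.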

Genome annotation 

The Glimmer 3 gene finder was used to identify and extract sequences for potential coding regions. For the functional annotation steps, the RAST server [47] and the Blast2GO pipeline [48] were used. Blast2GO performed the blasting, GO-mapping and annotation steps, which included a description according to the ProDom, FingerPRINTScan, PIR-PSD, Pfam, TIGRFAMs, PROSITE, SMART, SuperFamily, Pattern, Gene3D, PANTHER, SignalP and TMHMM databases. The results were summarized with InterPro [49]. Additionally, a GO-EnzymeCode mapping step was used to retrieve KEGG pathway-maps. tRNA genes were identified by using tRNAscan-SE [50] and rRNA genes by using RNAmmer [51]. The possible orthologs of the genome were identified based on the COG database and classified accordingly [52]. Prophage region prediction was also conducted by using the PHAST tool [53].

Table 2. Genome sequencing project information

MIGS ID     Property                   Term
MIGS-31     Finishing quality          Improved high-quality draft
MIGS-28     Libraries used             One paired-end library, 90:90 bp with 500 bp insert
MIGS-29     Sequencing platforms       Illumina Hi-Seq 2000
MIGS-31.2   Fold coverage              100x
MIGS-30     Assemblers                 CLC Assembly Cell version 4.0.10
MIGS-32     Gene calling method        Glimmer3, tRNAscan-SE
            Genbank ID                 AQPX00000000
            Genbank Date of Release    May 10, 2013
            GOLD ID                    Gi39289
            Project relevance          Biotechnology, metabolic pathway

Genome properties 

The genome summary and statistics are provided in Tables 3 and 4 and Figure 4. The genome consists of 96 scaffolds with a total size of 4,856,302 bp and a GC content of 37.5%. A total of 23 scaffolds were successfully aligned to a reference sequence, comprising 4,096,672 bp of sequence; they are represented by the red and blue bars within the outer ring of Figure 4. Of the 4,938 genes predicted, 4,846 were protein-coding genes and 46 were RNA genes; 1,623 pseudogenes were also identified. Genes assigned a putative function comprised 67.13% of the protein-coding genes, while the remaining ones were annotated as hypothetical proteins. The distribution of genes into COGs functional categories is presented in Table 5.
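
The percentages reported in Table 4 can be rechecked from the raw counts with a few lines of arithmetic. The sketch below is only a consistency check on the published values; the helper name `percent` is hypothetical. Under this recalculation the gene-level percentages in Table 4 are reproduced when the 4,938 total genes are used as the denominator, so the 67.13% figure appears to be relative to all predicted genes.

```python
GENOME_SIZE_BP = 4_856_302   # total assembly size (Table 4)
TOTAL_GENES = 4_938          # all predicted genes (Table 4)
PROTEIN_CODING = 4_846       # protein-coding genes (Table 4)


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


# Nucleotide-level attributes use the genome size as denominator.
gc_content = percent(1_821_262, GENOME_SIZE_BP)
coding_fraction = percent(3_924_297, GENOME_SIZE_BP)

# Gene-level attributes, computed against the total gene count.
protein_coding_share = percent(PROTEIN_CODING, TOTAL_GENES)
with_function_of_total = percent(3_315, TOTAL_GENES)

# Same count against protein-coding genes only, for comparison.
with_function_of_cds = percent(3_315, PROTEIN_CODING)

print(gc_content, coding_fraction, protein_coding_share)
print(with_function_of_total, with_function_of_cds)
```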
Table 3. Summary of genome

Label                        Size (bp)    Topology    INSDC identifier
Chromosomal scaffold         4,096,672    Circular    KB933398.1
Extrachromosomal elements    759,630      Linear      KB933399.1-KB933469.1



Table 4. Nucleotide content and gene count levels of the genome

Attribute                           Value        % of total^a
Genome size (bp)                    4,856,302    100.00
DNA GC content (bp)                 1,821,262    37.50
DNA coding region (bp)              3,924,297    80.81
Number of replicons                 1
Extrachromosomal                    0
Total genes                         4,938        100
RNA genes                           46           0.93
rRNA operons                        7
tRNA genes                          38           0.77
Pseudogenes                         1,623        32.87
Protein-coding genes                4,846        98.14
Genes in paralog clusters           658          13.33
Genes assigned to COGs              2,946        59.66
1 or more conserved domains         2,946        59.66
2 or more conserved domains         529          10.71
3 or more conserved domains         98           1.98
Genes with function prediction      3,315        67.13
Genes assigned Pfam domains         2,799        56.68
Genes with signal peptides          1,206        24.42
Genes with transmembrane helices    1,206        24.42
CRISPR repeats                      0            0.00



a) The total is based on either the size of the genome in base pairs or the 
total number of protein coding genes in the annotated genome. 
Table 5. Number of genes associated with the 25 general COG functional categories

Code    Value          %age^a         Description
J       [illegible]    [illegible]    Translation
A       [illegible]    [illegible]    RNA processing and modification
K       [illegible]    [illegible]    Transcription
L       [illegible]    [illegible]    Replication, recombination and repair
B       [illegible]    [illegible]    Chromatin structure and dynamics
D       [illegible]    [illegible]    Cell cycle control, mitosis and meiosis
Y       [illegible]    [illegible]    Nuclear structure
V       [illegible]    [illegible]    Defense mechanisms
T       [illegible]    [illegible]    Signal transduction mechanisms
M       [illegible]    [illegible]    Cell wall/membrane biogenesis
N       [illegible]    [illegible]    Cell motility
Z       [illegible]    [illegible]    Cytoskeleton
W       [illegible]    [illegible]    Extracellular structures
U       [illegible]    [illegible]    Intracellular trafficking and secretion
O       [illegible]    [illegible]    Posttranslational modification, protein turnover, chaperones
C       [illegible]    [illegible]    Energy production and conversion
G       [illegible]    [illegible]    Carbohydrate transport and metabolism
E       [illegible]    [illegible]    Amino acid transport and metabolism
F       [illegible]    [illegible]    Nucleotide transport and metabolism
H       [illegible]    [illegible]    Coenzyme transport and metabolism
I       [illegible]    [illegible]    Lipid transport and metabolism
P       [illegible]    [illegible]    Inorganic ion transport and metabolism
Q       98             2.07           Secondary metabolites biosynthesis, transport and catabolism
R       450            9.51           General function prediction only
S       234            4.95           Function unknown
-       1,694          37.74          Not in COGs



a) The total is based on the total number of protein coding genes in the annotated genome. 



Insights into the genome 

To complete the assembly process, a resequencing pipeline was applied that used whole genome sequences as references, namely Lysinibacillus sphaericus C3-41, Bacillus sp. strain B-14905, Bacillus sp. NRRL B-14911, Bacillus megaterium QM B1551, Bacillus anthracis Ames, Lysinibacillus boronitolerans F1182 and Lysinibacillus fusiformis ZC1. Mapping coverage was lower than 30% in every case (data not shown). In addition, GC content and depth-GC correlation analysis demonstrated neither a biased distribution nor heterogeneity in the GC content of the raw data. Thus, a de novo assembly was conducted in the CLC Assembly Cell version 4.0.10, as discussed above, resulting in a 123-scaffold assembly with an N50 of 96,816 bp. After the gap-filling step, all intrascaffold gaps and 29 interscaffold gaps were closed, leaving 94 scaffolds with an N50 of 205,086 bp. Finally, a mapping step was conducted using the sequences mentioned above as references. This yielded 26 supercontigs that mapped to the L. sphaericus strain C3-41 chromosome, corresponding to 88.9% of the reference chromosome. This alignment was proposed as a chromosomal scaffold. Other reference sequences led to no significant coverage levels, and extrachromosomal scaffolds did not align to previously sequenced plasmids of related species (data not shown). Chromosomal comparison from the PROmer analysis between L. sphaericus strains OT4b.31 and C3-41 showed that most of the two chromosomes mapped onto each other, revealing large segments of high similarity (Figure 5). However, a region comprising around 2 to 3.25 Mbp in the C3-41 chromosome and contigs 15 to 19 in the chromosomal scaffold were remarkably scattered in the dot-plot, revealing low coverage levels and different syntenial relationships to the reference sequence.
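
The N50 values quoted above summarize assembly contiguity: N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name is hypothetical and the example lengths are invented for illustration, not taken from the OT4b.31 assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # assembly size defines N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


# Toy example with made-up lengths (bp).
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

Closing gaps between scaffolds merges short sequences into longer ones, which is why N50 rises after a gap-filling step even though the total assembly size barely changes.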

The origin of replication of the chromosome of L. sphaericus OT4b.31 was estimated by similarities to several features of the corresponding regions in L. sphaericus C3-41, Bacillus sp. B-14905 and other closely related bacteria, including colocalization of the genes dnaX, recR, holB, dnaA, recG and recA, and by GC nucleotide skew [(G-C)/(G+C)] analysis. In the first 40 Kbp of contig 1, we found dnaX, recR and holB, while dnaA, recG and recA were found at the end (after 290 Kbp) of contig 13. This may suggest that contig 13 should be allocated immediately before contig 1. In addition, there was no evidence of multiple dnaA boxes around the potential origin.
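
The GC skew statistic (G-C)/(G+C) used here can be computed over consecutive windows along a sequence, as in the sketch below. This is an illustrative implementation, not the tool used for Figure 4; the function names and the default window size of 10,000 bp are assumptions. In many bacterial chromosomes the cumulative skew tends to reach its minimum near the origin and its maximum near the terminus, which is the kind of signal such an analysis looks for.

```python
from typing import List, Optional, Tuple


def gc_skew(
    sequence: str,
    window: int = 10_000,        # assumed window size, chosen for illustration
    step: Optional[int] = None,  # defaults to non-overlapping windows
) -> List[Tuple[int, float]]:
    """Return (window_start, (G - C) / (G + C)) for each full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = sequence.upper()

    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        # Windows without any G or C are reported as zero skew.
        skews.append((start, (g - c) / (g + c) if (g + c) else 0.0))
    return skews


def cumulative_skew(skews: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Running sum of window skews, keyed by window start position."""
    total = 0.0
    result = []
    for start, value in skews:
        total += value
        result.append((start, total))
    return result
```

On a draft scaffold, the contig order and orientation affect where the skew changes sign, so a skew-based estimate of origin or terminus is only as reliable as the scaffold layout it is computed on.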



The replication termination site of the chromosomal scaffold is believed to be localized near 2.5 Mbp in contig 18, according to GC skew analysis. The coding bias of the two strands of the chromosome places the majority of CDSs on the outer strand from 0 to ~2.5 Mbp and on the inner strand from ~2.5 Mbp to the end of the chromosomal scaffold (contig 26, Figure 4). This was also confirmed by the presence of parC (H131_12178) and parE (H131_12183), which encode the subunits of the chromosome-partitioning enzyme topoisomerase IV [54]. Similar to the L. sphaericus C3-41 genome [55], we did not find the homolog of rtp (replication terminator protein-encoding gene) in the chromosomal assembly of OT4b.31.







Figure 4. Graphical map of the genome. From outside to the center: Ordered and oriented scaffolds assigned 
to chromosome in blue and red, extrachromosomal scaffolds in orange and black, Genes on forward strand 
(color by COG categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, 
rRNAs gray), GC content and GC skew. 
Figure 5. (A) Dot-plot of amino-acid-based alignment of a 4.09 Mbp chromosomal scaffold of L. sphaericus OT4b.31 (y-axis) to a 4.6 Mbp chromosome of L. sphaericus C3-41 (x-axis). Aligned segments are represented as dots or lines. Forward matches are plotted in red, reverse matches in blue. Figure generated by PROmer [45]. (B) Nucleotide-based alignment of a 4.09 Mbp chromosomal scaffold of L. sphaericus OT4b.31 (right) to a 4.6 Mbp chromosome of L. sphaericus C3-41 (left). A total of 27 homologous blocks are shown as identically colored regions and linked across the sequences. Regions that are inverted relative to L. sphaericus OT4b.31 are shifted to the right of the center axis of the sequence. The origin of replication in each sequence is approximately at coordinate 1. Red bars show the limits of each contig in the chromosomal scaffold. Contigs 1 to 26 are numbered in ascending order starting at coordinate 1. The figure was generated by Mauve [46].



A total of 42 hypothetical protein coding sequences were assigned as putative transposable elements, with the most frequent families being IS66, IS110, IS1272 and IS3. In addition, five prophage regions were identified, of which one region is intact and four regions are incomplete. Lactobacillus phage C5 (intact), Bacillus phage phi105, Clostridium phage c-st, Bacillus phage SPP1 and Bacillus phage WBeta predicted regions were allocated at contigs 34, 8, 15, 18 and 37, respectively. Only lysis proteins were predicted in the phage C5 and c-st regions. The only genes remaining in the phage phi105 region are those for coat proteins, integrase, and hypothetical and phage-like coding sequences. This is probably the remnant of phage invasion and genome deterioration during evolution. In addition, none of the previously reported phages in the genome of L. sphaericus C3-41 are present in the genome of OT4b.31.

Two elements contain conserved domains from the Listeria pathogenicity island LIPI-1, functionally assigned as a thiol-activated cytolysin and a phosphatidylinositol phospholipase C. The first was confirmed to correspond to the L. sphaericus B354 sphaericolysin coding gene in contig 18 (H131_12483). Sphaericolysin B354 has been reported to be widespread across L. sphaericus DNA homology groups, including not only IIA, IIB, IV and V [56] but also non-grouped strains such as OT4b.31. Upstream, in the same contig, a Bacillus toxin from the family Mtx2 (Pfam PF03318) was found and described as a hypothetical SiplA toxin coding sequence (H131_12498). Purified from Bacillus thuringiensis strain EG2158, SiplA is a secreted insecticidal protein of 38 kDa with activity against the Colorado potato beetle (Leptinotarsa decemlineata) [57]. Considering that L. sphaericus OT4b.31 was isolated from beetle larvae, we suggest potential coleopteran larvicidal activity. To our knowledge, strain OT4b.31 is the first report of a predicted SiplA-like toxin in a native Lysinibacillus sphaericus. Unexpectedly, mtx or bin mosquito pathogenic genes were not found in the OT4b.31 genome, despite a previous report showing positive evidence of BinA/B toxins with no larvicidal activity [10].

A total of 32 CDSs were described as surface (S) layer proteins or S-layer homologs (SLH). The putative S-layer gene sllB (H131_05299) previously reported in L. sphaericus JG-A12 [58] was found in a 3,696 bp sequence allocated in contig 8. Three sequences with conserved domains similar to Slp5 and Slp6 were identified in contigs 8 (H131_05339, H131_05344) and 22 (H131_16838). Bacillus sp. B-14905 provided the most similar sequence for the majority of S-layer protein domains. In addition, a putative glycoprotein (H131_22117), a bifunctional periplasmic precursor (H131_05993) and an S-layer fusion (H131_05409) coding sequence associated with S-layer proteins were recognized. On the other hand, a cluster of spore germination genes was identified near the replication termination site (including genes from the ger and ype operons), among other genes widespread in the genome. Three clusters of sporulation genes were allocated at contigs 1, 10 and 13 (including genes from the spoII, spoV, yaa and sig operons).

Responses against toxic metal(oid)s in L. sphaericus OT4b.31 could be controlled by efflux pump-related genes in clusters found in several contigs. The putative coding sequence order is as follows: yozA→czcD→csoR→copZA (contig 1, H131_00045:H131_00065); nikABC→oppD→nikD (contig 17, H131_11103:H131_11123); cadC-like→cadA (contig 24, H131_17086:H131_17081); arsRBC→putative extracellular secreted protein CDS→arsR-like→czrB-like→putative excinuclease CDS (contig 18, H131_11998:H131_12028). The function of YozA is still unknown [59], but it is similar to CzrA and CadC, which belong to the ArsR family of transcriptional regulators. YozA, CsoR (from the copper-sensitive operon), CadC-like and ArsR proteins seem to be the direct regulators of each cluster. At least one additional copy of each of the ChrA, CzrB and CzcD CDSs was found. Upstream of the nik cluster, we could not find transcriptional regulators. In summary, L. sphaericus OT4b.31 has protein encoding sequences probably involved in resistance against Cd, Zn, Co, Cu, Ni, Cr and As. In fact, prior reports of resistance to toxic metals [16,17] in L. sphaericus OT4b.31 may be explained by the participation of heavy-metal resistance proteins.

Strain OT4b.31 probably has a diverse defense repertoire according to the following responses and predicted genes: bacitracin stress responses, genes bceBASR and yvcPQRS; multidrug resistance, MATE (multidrug and toxin extrusion) family efflux pump genes ydhE/norM and acrB; antibiotic resistance, genes vanRSW, tetP-like group II, fusA (elongation factor G), fosB, blaZ and ampC-like. Based on the KEGG analysis, some predicted proteins might participate in peripheral pathways for the degradation of benzoate, aminobenzoate, quinate, toluene, naphthalene, geraniol, limonene, pinene, chloroalkane, chloroalkene, styrene, ethylbenzene, caprolactam and atrazine compounds, and in the biosynthesis of streptomycin, novobiocin, zeatin, ansamycins, penicillin and cephalosporins.

Conclusions 

The native Colombian strain Lysinibacillus sphaericus OT4b.31, isolated from beetle larvae, is classified between DNA similarity groups III and IV. A comparison of the chromosomal sequences of strain OT4b.31 and its closest complete genome sequence, L. sphaericus C3-41, demonstrates the presence of only a few similar regions with syntenial rearrangements, and no prophages or putative mosquitocidal toxins are shared. Sphaericolysin B354 and the coleopteran toxin SiplA were predicted in strain OT4b.31, a finding which may be useful not only in bioremediation of polluted environments, but also for biological control of agricultural pests. Finally, Cd, Zn, Co, Cu, Ni, Cr and As resistances are probably supported by efflux pump genes.